Scheduling Constrained-Deadline Parallel Tasks on Two-type Heterogeneous Multiprocessors


Bjorn  Andersson 

Carnegie  Mellon  University 


Gurulingesh  Raravi 

Xerox  Research  Centre  India 


Abstract — Consider the problem of scheduling a taskset on a multiprocessor to meet all deadlines. Assume (i) constrained-deadline sporadic tasks, i.e., a task generates a sequence of jobs and the deadline of a job is no greater than the minimum inter-arrival time of the task that generates the job, (ii) stage-parallelism, i.e., a task comprises one or more stages with a stage comprising one or many segments so that segments in the same stage are allowed to execute in parallel and a segment is allowed to execute only if all segments of the previous stage have finished, (iii) a two-type heterogeneous multiprocessor platform, i.e., there are processors of two types, type-1 and type-2, and for each task, there is a specification of its execution speed on a type-1 processor and on a type-2 processor, and (iv) intra-type migration, i.e., a job can migrate between processors of the same type but, for a task, all jobs of this task must execute on the same processor type. We present an algorithm for this problem; it assigns each task to a processor type and then schedules tasks on the processors of each type with global-Earliest-Deadline-First. It has pseudo-polynomial time complexity and, in our evaluation with randomly-generated tasksets on systems with up to 256 tasks and 256 processors, the algorithm never took more than 2.5 seconds to finish. We show that the speedup factor of the algorithm is at most 5. This is the first algorithm for scheduling parallel real-time tasks on a heterogeneous multiprocessor with provably good performance.

I. Introduction

Software systems are expected to do more with less, i.e., provide more functionality and greater performance with lower size, weight, and power consumption. The research community has taken a great interest in developing methods that provide foundations for doing so while ensuring, before run-time, that the software system can respond, at run-time, to certain events within pre-specified time constraints (deadlines). Such foundations include algorithms for scheduling tasks on heterogeneous multiprocessors. These algorithms are useful because heterogeneous multiprocessors typically provide more processing power per watt. Other foundations include algorithms for scheduling tasks that can execute in parallel on multiprocessors. These algorithms are relevant for computations responding to events with such tight deadlines that, even if a computation is executed on a system with no other computations present, the only way for the computation to meet its deadline is to execute in parallel.

A computer platform is a homogeneous multiprocessor (sometimes called identical multiprocessor) if the execution speed of all tasks is the same on all processors. Conversely, a computer platform is a heterogeneous multiprocessor (sometimes called unrelated multiprocessor) if the execution speed of a task depends on both the processor and the task. A heterogeneous multiprocessor is two-type (a.k.a. two-type platform) if it has two types of processors. Analogously, a heterogeneous multiprocessor is t-type (a.k.a. t-type platform) if it has t types of processors. For two-type platforms, the problem of assigning tasks to processors is NP-hard in the strong sense and the problem of assigning tasks to processor types is NP-hard [1]. For t-type platforms, both problems are NP-hard in the strong sense [2], [3]. Consequently, the research community has developed approximation algorithms (i.e., algorithms with finite speedup factors) for assigning tasks to processors and to processor types [1]-[14] on such platforms.

Related work. The algorithms for assigning implicit-deadline sporadic tasks (i.e., a task generates a sequence of jobs and a job has a deadline that is equal to the minimum inter-arrival time of the task that generates the job) to processors and to processor types for two-type platforms [1], [4]-[7] have lower time complexity than the algorithms for t-type platforms [2], [3], [8]-[13] while maintaining their performance bounds. In addition, an algorithm for scheduling arbitrary-deadline sporadic tasks (i.e., a task generates a sequence of jobs and a job has a deadline that may be less than, equal to, or greater than the minimum inter-arrival time of the task that generates the job) on t-type platforms is known as well [14]. However, none of them [1]-[14] support parallel tasks. The research community has also presented algorithms with proven speedup factors for scheduling parallel tasks on homogeneous multiprocessors [15]-[21]. Further, there are other algorithms [22]-[31] with no proven speedup factor for scheduling parallel tasks on homogeneous multiprocessors; some of them [22]-[27] are for constrained-deadline sporadic tasks (i.e., a task generates a sequence of jobs and a job has a deadline that is less than or equal to the minimum inter-arrival time of the task that generates the job) and the others [28]-[31] are for implicit-deadline sporadic tasks. Unfortunately, none of these works [15]-[31] support heterogeneous multiprocessors (and, moreover, most of these algorithms [22]-[31] have no proven speedup factors). A work by Holenderski et al. [32] comes closest to ours as it also deals with the problem of scheduling parallel tasks on heterogeneous multiprocessors. However, the approach presented in [32] has no proven speedup factor.

This research. In this paper, we present an algorithm for scheduling parallel tasks on a two-type heterogeneous multiprocessor with a finite speedup factor. Our approach assigns each task to a processor type and then uses global-Earliest-Deadline-First (gEDF) on the processors of each type to schedule the respective tasks. We show that our algorithm has pseudo-polynomial time complexity and a speedup factor of at most 5. We study the constrained-deadline sporadic task model and consider parallelism with stages (i.e., a task is described with one or more stages with each stage comprising one or many segments such that segments in the same stage are allowed to execute in parallel but a segment is only allowed to execute if all segments of the previous stage have finished execution). This work makes the following contribution: it presents the first algorithm for scheduling parallel real-time tasks on a heterogeneous multiprocessor with a proven speedup factor.

[Figure: a task $\tau_i$ with stages $1, 2, \ldots, j, \ldots, ns_i$; stage $j$ comprises $nseg_{i,j}$ segments.]

Fig. 1: The parallel task model studied in this paper.

Organization of the paper. The rest of this paper is organized as follows. Section II states the system model. Section III lists previous results for parallel scheduling on homogeneous multiprocessors and also proves new lemmas that we use later in the paper. Section IV presents our new algorithm for two-type heterogeneous multiprocessors and proves its speedup factor and time complexity. Section V presents the evaluation. Section VI concludes.

II.  System  Model 

We consider the problem of scheduling a set $\tau$ of constrained-deadline sporadic tasks on a two-type heterogeneous multiprocessor platform $\pi$ comprising $m_1$ processors of type-1 and $m_2$ processors of type-2. A task $\tau_i \in \tau$ is characterized by a minimum inter-arrival time $T_i$ and a deadline $D_i$ such that $D_i \le T_i$. Each task $\tau_i$ generates a sequence of jobs, with the first job arriving at any time and subsequent jobs arriving at least $T_i$ time units apart.

The execution of a task $\tau_i$ is described by $ns_i$, $nseg_{i,j}$, and $C_{i,j}$ with the interpretation that a job of $\tau_i$ has $ns_i$ stages, with stage $j$ comprising $nseg_{i,j}$ segments and each segment of stage $j$ having an execution requirement of at most $C_{i,j}$ (see Fig. 1). A segment finishes when it performs a number of units of execution equal to its execution requirement. A segment executing contiguously for $L$ time units on a processor of speed $s$ performs $L \times s$ units of execution. A segment of a job is allowed to execute only if all segments of its previous stage have finished. A job finishes when all segments of its last stage have finished. If a job of $\tau_i$ finishes at most $D_i$ time units after its arrival, then it meets its deadline.

On a two-type platform, the execution speed of a job depends on the type of processor on which it executes. Let $r^1_i$ and $r^2_i$ denote the execution speeds of a job of task $\tau_i$ when it executes on a processor of type-1 and type-2, respectively. We now define terms that we use in the rest of the paper.

Definition 1 (Legal jobset). If, for each task in the taskset $\tau$, the task is assigned the number of jobs it generates, each job is assigned an arrival time such that the minimum inter-arrival time constraint is satisfied, and each segment of a job is assigned an execution requirement such that the upper bound on the execution requirement of a segment is respected, then we say that the resulting jobset is a legal jobset with respect to $\tau$.
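As an illustration of Definition 1, the following Python sketch checks whether a jobset is legal with respect to a taskset. The classes `Task` and `Job` and the function `is_legal_jobset` are hypothetical names introduced here; a job is represented by its arrival time and, per stage, the list of its segments' execution requirements. The sketch additionally assumes that execution requirements are non-negative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """Hypothetical container for a staged parallel task tau_i."""
    T: float            # minimum inter-arrival time T_i
    D: float            # relative deadline D_i
    nseg: List[int]     # nseg[j]: number of segments in stage j
    C: List[float]      # C[j]: upper bound on each segment's requirement


@dataclass
class Job:
    """Hypothetical container for one job of a task."""
    arrival: float
    exec_req: List[List[float]]   # exec_req[j][k]: segment k of stage j


def is_legal_jobset(tasks: List[Task], jobs_per_task: List[List[Job]]) -> bool:
    """Return True if the jobset satisfies Definition 1 (legal jobset)."""
    if len(tasks) != len(jobs_per_task):
        return False
    for task, jobs in zip(tasks, jobs_per_task):
        arrivals = sorted(job.arrival for job in jobs)
        # Consecutive arrivals must be at least T_i apart.
        if any(later - earlier < task.T
               for earlier, later in zip(arrivals, arrivals[1:])):
            return False
        for job in jobs:
            # The job must have exactly the stages and segments of its task.
            if len(job.exec_req) != len(task.nseg):
                return False
            for j, reqs in enumerate(job.exec_req):
                if len(reqs) != task.nseg[j]:
                    return False
                # Each requirement respects the bound C[j]
                # (non-negativity is an added assumption).
                if any(r < 0 or r > task.C[j] for r in reqs):
                    return False
    return True
```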

Definition  2  (Intra-migrative  schedule).  A  schedule  is  intra- 
migrative  if  both  of  the  following  conditions  are  true:  (i)  jobs 
are  allowed  to  migrate  between  processors  of  the  same  type 
and  (ii)  for  each  task,  it  holds  that,  if  a  job  executes  on  a 
processor  of  one  type  then  all  other  jobs  of  this  task  execute 
on  processors  of  the  same  type. 

Definition 3 (Intra-migrative feasible taskset). A taskset $\tau$ is intra-migrative feasible on a two-type platform $\pi$ if, for each jobset that is legal with respect to $\tau$, there exists an intra-migrative schedule in which all deadlines are met.

Definition 4 ($S$-schedulable taskset). A taskset $\tau$ is $S$-schedulable on a two-type platform $\pi$ if, for each jobset that is legal with respect to $\tau$ and for each schedule that $S$ can generate from the jobset, it holds that the schedule is intra-migrative and all deadlines are met.

Definition 5 (Speed of the computing platform). If $\pi$ is a two-type platform, then let $\pi \times x$ denote a two-type platform where the speed of each processor is multiplied by $x$.

Definition 6 (Speedup factor). A scheduler $S$ has a speedup factor $SF_S$ if, for each taskset $\tau$ and for each two-type platform $\pi$, it holds that: if $\tau$ is intra-migrative feasible on $\pi$, then $\tau$ is $S$-schedulable on $\pi \times SF_S$.

In order to simplify our discussion in the rest of the paper, we rewrite our model to an equivalent formulation as follows. Instead of using $C_{i,j}$, $r^1_i$, and $r^2_i$, we use parameters $C^1_{i,j}$, $C^2_{i,j}$, and $s$ selected as follows: $C^1_{i,j} = C_{i,j}/r^1_i$ and $C^2_{i,j} = C_{i,j}/r^2_i$, where $s$ denotes the speed of the processors. We let "s.t." mean "such that" and ":" mean "it holds that". We let $\{x \mid f(x)\}$ denote a set of elements such that an element $x$ is in the set if and only if $f(x)$ is true. For convenience, we write the predicate $(\forall t > 0 : x)$ to mean the predicate $(\forall t \text{ s.t. } t > 0 : x)$. For convenience, we also define $DMAX = \max_{\tau_i \in \tau} D_i$, $DMIN = \min_{\tau_i \in \tau} D_i$, and $TMAX = \max_{\tau_i \in \tau} T_i$.
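The reformulation above can be sketched in Python as follows. The function names are hypothetical; `C` holds the bounds $C_{i,j}$ of one task, and `r1` and `r2` are its speeds $r^1_i$ and $r^2_i$. For example, a task that executes twice as fast on type-1 ($r^1_i = 2$) has its per-segment bounds halved on that type.

```python
from typing import List, Sequence, Tuple


def type_specific_bounds(C: Sequence[float], r1: float,
                         r2: float) -> Tuple[List[float], List[float]]:
    """Return (C1, C2) with C1[j] = C[j] / r1 and C2[j] = C[j] / r2."""
    C1 = [c / r1 for c in C]
    C2 = [c / r2 for c in C]
    return C1, C2


def extremes(tasks: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """tasks holds (D_i, T_i) pairs; return (DMAX, DMIN, TMAX)."""
    dmax = max(D for D, _ in tasks)
    dmin = min(D for D, _ in tasks)
    tmax = max(T for _, T in tasks)
    return dmax, dmin, tmax
```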

III. Schedulability Analysis of Parallel Tasks on a Homogeneous Multiprocessor

There is no optimal online algorithm for scheduling sporadic tasks on a homogeneous multiprocessor (even for tasks without parallelism) [33]. Therefore, we use global-Earliest-Deadline-First (gEDF) scheduling as it has a good speedup factor [34]. An exact schedulability test exists for gEDF [35] but it has high time complexity, does not support parallel tasks, and requires task parameters to be integers. Therefore, in this work, we use a sufficient (not exact) schedulability test for parallel tasks.

$$\tau \text{ is feasible on } m \text{ processors of speed } s \;\Rightarrow\; \forall t > 0 : \sum_{\tau_i \in \tau} \mathrm{ffdbf}(\tau_i, t, 1, s) \le m \times t \times s \qquad (7)$$

Fig. 2: Previously known [17] schedulability analysis for gEDF scheduling of parallel tasks on a homogeneous multiprocessor.

Fig. 3: New expressions we will use.

The literature offers many sufficient schedulability tests for gEDF for tasks that are not parallel; see, for example, [36]-[38]. Of particular interest is [38], which offers a schedulability test with a speedup factor of two. The key to its good performance is the use of ffdbf (forced-forward demand-bound function) for describing the maximum amount of execution a task can demand in a time interval, rather than the traditional dbf (demand-bound function). This schedulability test [38] states that if there exists a $\sigma$ such that $\sigma$ is at least as large as the density of each task and, for each value of $t$, the sum of the ffdbf of the tasks is at most a certain value, then the taskset is gEDF-schedulable. Later work [17] extended this to parallel tasks by defining ffdbf for parallel tasks. Fig. 2 shows this schedulability test (see Eq. (6)) for parallel tasks on a homogeneous multiprocessor [17] comprising $m$ processors. Eq. (4) shows the definition of ffdbf. We define ffdbf using WJ, where WJ should be read as: Work-performed-by-a-single-Job-of-task-$\tau_i$. WJ is defined using WJS, where WJS should be read as: Work-performed-by-Stage-$j$-and-later-of-a-single-Job-of-task-$\tau_i$. WJS is defined using $bsp_{i,j}$ and $sp_{i,j}$, where $bsp_{i,j}$ should be read as: the-SPan-of-time-during-which-stage-$j$-of-task-$\tau_i$-can-keep-all-processors-Busy-for-the-case-that-there-is-no-contention-from-other-jobs, and $sp_{i,j}$ should be read as: the-SPan-of-time-during-which-stage-$j$-of-task-$\tau_i$-can-keep-at-least-one-processor-busy-for-the-case-that-there-is-no-contention-from-other-jobs. $C_i$ denotes an upper bound on the total execution requirement of a job of $\tau_i$; that is, if a job of $\tau_i$ executes with no contention from other jobs on a computer system with a single processor, then it takes $C_i/s$ time units to finish. $\eta_i$ denotes an upper bound on the number of elapsed units of execution performed by a job of task $\tau_i$; that is, if a job of $\tau_i$ executes with no contention from other jobs on a computer system with an infinite number of processors, then it takes $\eta_i/s$ time units to finish.
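To make $C_i$ and $\eta_i$ concrete, the sketch below computes both from $nseg_{i,j}$ and $C_{i,j}$, assuming every segment of stage $j$ uses its full bound $C_{i,j}$ (so $C_i = \sum_j nseg_{i,j} \times C_{i,j}$, as in Eq. (1)). It also computes the finishing time of a single job with no contention on $m$ identical processors of speed $s$, assuming preemption and migration are free; each stage then takes McNaughton's optimal preemptive length, $\max(C_{i,j}, nseg_{i,j} \times C_{i,j}/m)$ units of elapsed execution. With $m = 1$ this gives $C_i/s$, and once $m \ge \max_j nseg_{i,j}$ it gives $\eta_i/s$. The function names are illustrative only.

```python
from typing import Sequence


def total_work(nseg: Sequence[int], C: Sequence[float]) -> float:
    """C_i: the work of one job, summed over all segments of all stages."""
    return sum(n * c for n, c in zip(nseg, C))


def span(C: Sequence[float]) -> float:
    """eta_i: with unboundedly many processors every segment of a stage runs
    at once, so each stage costs only its longest segment, C[j]."""
    return sum(C)


def isolated_finish_time(nseg: Sequence[int], C: Sequence[float],
                         m: int, s: float) -> float:
    """Time for one job, alone, on m identical processors of speed s.
    Stages run one after another; a stage of n equal segments of length c
    needs max(c, n * c / m) units of elapsed execution (McNaughton)."""
    return sum(max(c, n * c / m) for n, c in zip(nseg, C)) / s
```

For a task with `nseg = [1, 4, 1]` and `C = [2.0, 3.0, 1.0]`, the work is 15 and the span is 6, so the job needs 15 time units on one unit-speed processor and 6 on four or more.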

Fig. 2 also shows a feasibility test (see Eq. (7)) for parallel tasks on a homogeneous multiprocessor. Since this formulation is for homogeneous multiprocessors, we do not have the superscripts 1 and 2 on $C_{i,j}$. Basic properties of ffdbf are shown below.

Lemma 1. $\forall t_0 > 0, \forall t \ge t_0 : \mathrm{ffdbf}(\tau_i, t_0, v, s) \le \mathrm{ffdbf}(\tau_i, t, v, s)$

Proof: Follows from Eqs. (1)-(4). ■

Lemma 2. $\forall l \in \mathbb{N} : \mathrm{ffdbf}(\tau_i, t + l \times T_i, v, s) = \mathrm{ffdbf}(\tau_i, t, v, s) + l \times C_i$

Proof: Follows from Eq. (4). ■
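Eqs. (1)-(4), which define ffdbf for parallel tasks, are not reproduced in this text, so the following Python sketch instead implements the ffdbf of [38] for a sequential (non-parallel) sporadic task on unit-speed processors. It checks, on a grid of interval lengths, the analogues of Lemma 1 (monotonicity) and Lemma 2 (adding $l$ periods adds $l \times C_i$). This illustrates the two properties; it is not an implementation of Eq. (4). The name `ffdbf_seq` is hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def ffdbf_seq(C, D, T, t, sigma):
    """Forced-forward demand bound of a sequential sporadic task (C, D, T)
    over an interval of length t >= 0, for a parameter sigma > 0,
    following the sequential definition used in [38] (unit speed)."""
    q = t // T                 # number of whole periods that fit in t
    alpha = t - q * T          # leftover length, 0 <= alpha < T
    if alpha >= D:
        tail = C               # the last job's full demand is forced in
    elif alpha >= D - C / sigma:
        tail = C - (D - alpha) * sigma   # partially forced forward
    else:
        tail = 0
    return q * C + tail
```

For instance, with $C = 3$, $D = 7$, $T = 10$, and $\sigma = 1/2$, `ffdbf_seq` returns 11/2 at $t = 16$ and 19/2 at $t = 32$, so the value at 16 plus $C$ does not exceed the value at 32, matching the form of Lemma 3 when the interval length 16 is at least $T$.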

Let $\Delta(\tau)$, defined in Fig. 3, denote the duration of a time interval.

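Lemma 3. $\mathrm{ffdbf}(\tau_i, \Delta(\tau), v, s) + C_i \le \mathrm{ffdbf}(\tau_i, 2 \times \Delta(\tau), v, s)$

Proof: Applying Lemma 2 with $t = \Delta(\tau)$ and $l = 1$ yields $\mathrm{ffdbf}(\tau_i, \Delta(\tau), v, s) + C_i = \mathrm{ffdbf}(\tau_i, \Delta(\tau) + T_i, v, s)$. From the definition of $\Delta(\tau)$, it follows that $T_i \le \Delta(\tau)$, and hence $\Delta(\tau) + T_i \le 2 \times \Delta(\tau)$. Applying this to the above and using Lemma 1 yields $\mathrm{ffdbf}(\tau_i, \Delta(\tau), v, s) + C_i \le \mathrm{ffdbf}(\tau_i, 2 \times \Delta(\tau), v, s)$. ■

C1. $\exists \sigma^1 \text{ s.t. } (\forall \tau_i \in \tau \text{ s.t. } x^1_i = 1 : \eta^1_i/D_i \le \sigma^1) \wedge (\forall t > 0 : (\sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, t, \sigma^1/s, s) \times x^1_i) \le ((m_1 - (m_1 - 1) \times \sigma^1/s) \times t \times s))$
C2. $\exists \sigma^2 \text{ s.t. } (\forall \tau_i \in \tau \text{ s.t. } x^2_i = 1 : \eta^2_i/D_i \le \sigma^2) \wedge (\forall t > 0 : (\sum_{\tau_i \in \tau} \mathrm{ffdbf}^2(\tau_i, t, \sigma^2/s, s) \times x^2_i) \le ((m_2 - (m_2 - 1) \times \sigma^2/s) \times t \times s))$
C3. $\forall \tau_i \in \tau : x^1_i + x^2_i = 1$
C4. $\forall \tau_i \in \tau : x^1_i \in \{0, 1\}$ and $x^2_i \in \{0, 1\}$

Fig. 4: A naive formulation of constraints for task-to-processor-type assignment.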

Previous works [39], [40] have approximated the demand-bound function and used it for schedulability analysis on a single processor and on a homogeneous multiprocessor. We now define a function $\mathrm{ffdbf}^*$ (an approximation of ffdbf) that is fit for our purpose. For inputs where $t$ is at most $\Delta(\tau)$, $\mathrm{ffdbf}^*(\tau_i, t, v, s, \tau)$ is a staircase function, and for $t$ greater than $\Delta(\tau)$, $\mathrm{ffdbf}^*(\tau_i, t, v, s, \tau)$ increases linearly with $t$. Formally, Eq. (9) shows the definition of $\mathrm{ffdbf}^*(\tau_i, t, v, s, \tau)$.

Lemma 4. $\forall t > 0 : \mathrm{ffdbf}(\tau_i, t, v, s) \le \mathrm{ffdbf}^*(\tau_i, t, v, s, \tau)$

Proof: See Appendix. ■

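Definition 7. $TS(\tau, \theta) = \{t \mid (2^{\lfloor \log_2 t \rfloor} = t) \wedge (DMIN \times (1 - \theta) \le t \le \Delta(\tau))\} \cup \{2^{\lfloor \log_2 (DMIN \times (1 - \theta)) \rfloor}\}$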

Lemma 5. $\forall t \in TS(\tau, \theta) : \mathrm{ffdbf}^*(\tau_i, t, v, s, \tau) = \mathrm{ffdbf}(\tau_i, 2 \times t, v, s)$

Proof: Follows from Eq. (9) and Definition 7. ■

IV. New Algorithm and its Speedup Factor

In this section, we discuss scheduling on a two-type heterogeneous multiprocessor. We will use the notation in Fig. 2 and Fig. 3 but with 1 as a superscript; this superscript indicates that the quantity is based on $C^1_{i,j}$. Ditto for type-2. For example, from Eq. (1) we obtain: $C^1_i = \sum_{j=1}^{ns_i} (nseg_{i,j} \times C^1_{i,j})$ and $C^2_i = \sum_{j=1}^{ns_i} (nseg_{i,j} \times C^2_{i,j})$.

A.  Developing  the  new  algorithm 

The problem of intra-migrative scheduling of constrained-deadline parallel sporadic tasks on a two-type heterogeneous multiprocessor can be solved in two steps. Step 1: Before run-time, assign tasks to processor types so that (i) tasks assigned to type-1 are gEDF-schedulable on the processors of type-1 and (ii) tasks assigned to type-2 are gEDF-schedulable on the processors of type-2. Step 2: At run-time, schedule all tasks assigned to type-1 with gEDF on processors of type-1 and schedule all tasks assigned to type-2 with gEDF on processors of type-2. Since Step 2 is trivial, we only discuss Step 1.
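Although Step 2 is straightforward, a small sketch may help: at each scheduling instant, gEDF on the processors of one type runs the (at most) $m$ ready segments, among the tasks assigned to that type, whose jobs have the earliest absolute deadlines. Segments of the same stage of a job share the job's deadline and can therefore occupy several processors at once. The function `gedf_pick` is hypothetical and omits the bookkeeping of arrivals, stage completions, and preemptions.

```python
import heapq
from typing import List, Sequence, Tuple


def gedf_pick(ready: Sequence[Tuple[float, str]], m: int) -> List[str]:
    """Return the segments to run on the m processors of one type.

    ready holds (absolute_deadline, segment_id) pairs for segments that may
    execute now (their previous stage has finished); ties on the deadline
    are broken by segment_id."""
    chosen = heapq.nsmallest(m, ready)
    return [segment for _, segment in chosen]
```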

Step 1 could be solved as follows. Let $x^1_i = 1$ indicate that task $\tau_i$ is assigned to type-1 processors and let $x^2_i = 1$ indicate that task $\tau_i$ is assigned to type-2 processors. Let $X$ denote the matrix of $x_i$ values for all tasks in $\tau$. Then, by using Eq. (6), one could solve Step 1 by assigning values to the $x_i$ variables such that all the constraints in Fig. 4 are satisfied. Intuitively, C1 in Fig. 4 states that, according to the schedulability test of Eq. (6), the tasks assigned to type-1 processors are gEDF-schedulable on type-1 processors. C2 is analogous for type-2 processors. C3 combined with C4 states that a task is assigned to either type-1 or type-2. C4 states that the $x_i$ variables are integers. Unfortunately, creating an algorithm that assigns values to $X$ such that all the constraints in Fig. 4 are satisfied is challenging because (i) it involves an exists-quantifier ($\exists \sigma^1$ in C1 and $\exists \sigma^2$ in C2), (ii) it involves a forall-quantifier ($\forall t$ in C1 and $\forall t$ in C2), and (iii) it has integer variables (see C4). Hence, we now present other constraints such that, if these other constraints are satisfied, then the constraints in Fig. 4 are satisfied as well.

Let $\theta^1$ and $\theta^2$ be non-negative parameters that we can choose. Then, instead of asking whether there exists a $\sigma^1$ in C1 in Fig. 4 with certain properties, let us only consider the $\sigma^1$ such that $\sigma^1/s = \theta^1$. Then it follows that if there is a task with $x^1_i = 1$ and $\eta^1_i/D_i > \theta^1 \times s$, then C1 is violated. Hence, if $\theta^1$ is given, $\sigma^1/s = \theta^1$, and $\eta^1_i/D_i > \theta^1 \times s$, then it is necessary that $x^1_i = 0$. We can reason analogously for $\theta^2$ and C2. For this reason, we introduce the following sets.

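$$H12 \stackrel{\text{def}}{=} \{\tau_i \in \tau \mid (\eta^1_i/D_i > \theta^1 \times s) \wedge (\eta^2_i/D_i > \theta^2 \times s)\}$$
$$H1 \stackrel{\text{def}}{=} \{\tau_i \in \tau \mid (\eta^1_i/D_i \le \theta^1 \times s) \wedge (\eta^2_i/D_i > \theta^2 \times s)\}$$
$$H2 \stackrel{\text{def}}{=} \{\tau_i \in \tau \mid (\eta^1_i/D_i > \theta^1 \times s) \wedge (\eta^2_i/D_i \le \theta^2 \times s)\}$$
$$L \stackrel{\text{def}}{=} \{\tau_i \in \tau \mid (\eta^1_i/D_i \le \theta^1 \times s) \wedge (\eta^2_i/D_i \le \theta^2 \times s)\}$$

Observe that $\tau = H12 \cup H1 \cup H2 \cup L$. Let $H12(\theta^1, \theta^2, \tau, \pi)$ denote $H12$ for the parameters $\theta^1, \theta^2, \tau, \pi$. Analogously for $H1$, $H2$, and $L$.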
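The partition can be sketched in Python as below. The sketch assumes, as we read the set definitions above, that the quantities compared with $\theta^1 \times s$ and $\theta^2 \times s$ are $\eta^1_i/D_i$ and $\eta^2_i/D_i$; the function `partition` and the dictionary keys `eta1`, `eta2`, and `D` are hypothetical names.

```python
from typing import Dict, List, Tuple


def partition(tasks: Dict[str, Dict[str, float]], theta1: float,
              theta2: float, s: float) -> Tuple[List[str], ...]:
    """Split task names into (H12, H1, H2, L)."""
    H12, H1, H2, L = [], [], [], []
    for name, tk in tasks.items():
        fits1 = tk["eta1"] / tk["D"] <= theta1 * s   # may go to type-1
        fits2 = tk["eta2"] / tk["D"] <= theta2 * s   # may go to type-2
        if not fits1 and not fits2:
            H12.append(name)      # fits on neither type
        elif fits1 and not fits2:
            H1.append(name)       # can only go to type-1
        elif fits2 and not fits1:
            H2.append(name)       # can only go to type-2
        else:
            L.append(name)        # left for the LP/MILP to decide
    return H12, H1, H2, L
```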

Clearly, if $\theta^1$ and $\theta^2$ are given, $\sigma^1/s = \theta^1$, $\sigma^2/s = \theta^2$, and there is a task in $H12$, then it is impossible to satisfy Fig. 4. Also, if $\theta^2$ is given and $\sigma^2/s = \theta^2$, then it is necessary that, for each task $\tau_i \in H1$, $x^1_i = 1$. Analogously, if $\theta^1$ is given and $\sigma^1/s = \theta^1$, then it is necessary that, for each task $\tau_i \in H2$, $x^2_i = 1$. This yields Fig. 5. It can be seen that if $\theta^1$ and $\theta^2$ are given and $X$ satisfies Fig. 5, then $X$ satisfies Fig. 4.

Note that there is still a $\forall t$ in C1 and C2 in Fig. 5. We now present constraints with a finite number of values of $t$; see Fig. 6.

Lemma 6. If $X$ satisfies Fig. 6, then $X$ satisfies Fig. 5.

Proof: Suppose that the lemma were false. Then there exist $\tau, \pi, \theta^1, \theta^2, X$ such that $X$ satisfies Fig. 6 and $X$ does not satisfy Fig. 5. Since C3-C7 are the same in both figures, it can only be that C1 or C2 (or both) are violated in Fig. 5.

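C1. $\forall t > 0 : (\sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, t, \theta^1, s) \times x^1_i) \le ((m_1 - (m_1 - 1) \times \theta^1) \times t \times s)$
C2. $\forall t > 0 : (\sum_{\tau_i \in \tau} \mathrm{ffdbf}^2(\tau_i, t, \theta^2, s) \times x^2_i) \le ((m_2 - (m_2 - 1) \times \theta^2) \times t \times s)$
C3. $\forall \tau_i \in \tau : x^1_i + x^2_i = 1$
C4. $\forall \tau_i \in \tau : x^1_i \in \{0, 1\}$ and $x^2_i \in \{0, 1\}$
C5. $\forall \tau_i \in H1 : x^1_i = 1$
C6. $\forall \tau_i \in H2 : x^2_i = 1$
C7. $H12 = \emptyset$

Fig. 5: A slightly less naive formulation of constraints for task-to-processor-type assignment.

C1. $\forall t \in TS(\tau, \theta^1) : (\sum_{\tau_i \in \tau} \mathrm{ffdbf}^{*1}(\tau_i, t, \theta^1, s, \tau) \times x^1_i) \le ((m_1 - (m_1 - 1) \times \theta^1) \times t \times s)$
C2. $\forall t \in TS(\tau, \theta^2) : (\sum_{\tau_i \in \tau} \mathrm{ffdbf}^{*2}(\tau_i, t, \theta^2, s, \tau) \times x^2_i) \le ((m_2 - (m_2 - 1) \times \theta^2) \times t \times s)$
C3-C7. Same as C3-C7 in Fig. 5
C8. $(\sum_{\tau_i \in \tau} (C^1_i/T_i) \times x^1_i) \le ((m_1 - (m_1 - 1) \times \theta^1) \times s)$
C9. $(\sum_{\tau_i \in \tau} (C^2_i/T_i) \times x^2_i) \le ((m_2 - (m_2 - 1) \times \theta^2) \times s)$

Fig. 6: Constraints for task-to-processor-type assignment.

C1. $\forall t \in TS(\tau, \theta^1) : (\sum_{\tau_i \in \tau} \mathrm{ffdbf}^{*1}(\tau_i, t, \theta^1, s, \tau) \times x^1_i) \le ((m_1 - (m_1 - 1) \times \theta^1) \times t \times s \times 1/2)$
C2. $\forall t \in TS(\tau, \theta^2) : (\sum_{\tau_i \in \tau} \mathrm{ffdbf}^{*2}(\tau_i, t, \theta^2, s, \tau) \times x^2_i) \le ((m_2 - (m_2 - 1) \times \theta^2) \times t \times s \times 1/2)$
C3. Same as C3 in Fig. 6
C4. $\forall \tau_i \in \tau : x^1_i \ge 0$ and $x^2_i \ge 0$
C5-C7. Same as C5-C7 in Fig. 6
C8. $(\sum_{\tau_i \in \tau} (C^1_i/T_i) \times x^1_i) \le ((m_1 - (m_1 - 1) \times \theta^1) \times s \times 1/2)$
C9. $(\sum_{\tau_i \in \tau} (C^2_i/T_i) \times x^2_i) \le ((m_2 - (m_2 - 1) \times \theta^2) \times s \times 1/2)$

Fig. 7: Constraints for task-to-processor-type assignment, relaxed to LP.

C1-C9. Same as C1-C9 in Fig. 6
C10. $\forall \tau_i \in \tau \text{ s.t. } ((x'^1_i = 1) \vee (x'^2_i = 1)) : x^1_i = x'^1_i$
C11. $\forall \tau_i \in \tau \text{ s.t. } ((x'^1_i = 1) \vee (x'^2_i = 1)) : x^2_i = x'^2_i$
C12. $X'$ is the solution to the problem in Fig. 7 and $X'$ is a vertex solution

Fig. 8: Constraints for task-to-processor-type assignment; we will show this can be computed in pseudo-polynomial time.

If C1 is violated, then we can reason as follows. There must be a $t$ that violates C1 in Fig. 5. Hence,

$$\sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, t, \theta^1, s) \times x^1_i > (m_1 - (m_1 - 1) \times \theta^1) \times t \times s \qquad (10)$$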

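For this $t$, let us explore three possibilities:

Case 1: $t > \Delta(\tau)$. Note that $\Delta(\tau)$ is an element in $TS(\tau, \theta^1)$. Then, C1 in Fig. 6 yields:

$$\sum_{\tau_i \in \tau} \mathrm{ffdbf}^{*1}(\tau_i, \Delta(\tau), \theta^1, s, \tau) \times x^1_i \le (m_1 - (m_1 - 1) \times \theta^1) \times \Delta(\tau) \times s \qquad (11)$$

Then, C8 in Fig. 6 yields:

$$\sum_{\tau_i \in \tau} \frac{C^1_i}{T_i} \times x^1_i \le (m_1 - (m_1 - 1) \times \theta^1) \times s \qquad (12)$$

Multiplying Eq. (12) by $(t - \Delta(\tau))$ and adding it to Eq. (11), and then combining with Eq. (10), yields:

$$\sum_{\tau_i \in \tau} \left(\mathrm{ffdbf}^{*1}(\tau_i, \Delta(\tau), \theta^1, s, \tau) + \frac{C^1_i}{T_i} \times (t - \Delta(\tau))\right) \times x^1_i < \sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, t, \theta^1, s) \times x^1_i$$

Since $\Delta(\tau) \in TS(\tau, \theta^1)$, Lemma 5 can be applied to the left-most term. Doing so and then applying Lemma 3 yields:

$$\sum_{\tau_i \in \tau} \left(\mathrm{ffdbf}^1(\tau_i, \Delta(\tau), \theta^1, s) + C^1_i + \frac{C^1_i}{T_i} \times (t - \Delta(\tau))\right) \times x^1_i < \sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, t, \theta^1, s) \times x^1_i$$

Note that the left-hand side is the expression of $\mathrm{ffdbf}^{*1}$ (the second case of Eq. (9)). Hence:

$$\sum_{\tau_i \in \tau} \mathrm{ffdbf}^{*1}(\tau_i, t, \theta^1, s, \tau) \times x^1_i < \sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, t, \theta^1, s) \times x^1_i$$

But this contradicts Lemma 4.

Case 2: $t < DMIN \times (1 - \theta^1)$. For such a $t$, it holds that $\mathrm{ffdbf}^1(\tau_i, t, \theta^1, s)$ is zero for every task. But this violates Eq. (10).

Case 3: $DMIN \times (1 - \theta^1) \le t \le \Delta(\tau)$. Let us define $t_1$ as $t_1 = 2^{\lfloor \log_2 t \rfloor + 1}$ and let $t_0$ be $t_1/2$. It is easy to see that $t_0 \le t < t_1$. Note that $t_0$ is an element in $TS(\tau, \theta^1)$. This gives us, from Fig. 6:

$$\sum_{\tau_i \in \tau} \mathrm{ffdbf}^{*1}(\tau_i, t_0, \theta^1, s, \tau) \times x^1_i \le (m_1 - (m_1 - 1) \times \theta^1) \times t_0 \times s \qquad (13)$$

Since $t_0 \le t$, it clearly holds that:

$$(m_1 - (m_1 - 1) \times \theta^1) \times t_0 \times s \le (m_1 - (m_1 - 1) \times \theta^1) \times t \times s \qquad (14)$$

Lemma 5 yields: $\mathrm{ffdbf}^{*1}(\tau_i, t_0, \theta^1, s, \tau) = \mathrm{ffdbf}^1(\tau_i, t_1, \theta^1, s)$. Combining this with Eq. (10), Eq. (13), and Eq. (14) yields:

$$\sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, t_1, \theta^1, s) \times x^1_i < \sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, t, \theta^1, s) \times x^1_i$$

Using this, $t < t_1$, and Lemma 1 yields a contradiction.

If C2 is violated, then we can reason analogously to the case when C1 is violated.

It can be seen that, if the lemma is false, then each case results in a contradiction. Hence, the lemma is true. ■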

Note that in Fig. 6 there is a finite number of constraints, and this is what we want. However, the $X$ variables are integers and this makes the problem a Mixed-Integer Linear Program (MILP); the research literature currently offers a polynomial-time algorithm neither for solving general MILPs nor for solving MILPs with the special structure of Fig. 6. For Linear Programming (LP), however, polynomial-time algorithms are known (see [41], for example). It is also known that if there is a solution to an LP, then there is also a vertex solution to the LP. We now discuss how to exploit this. Fig. 7 shows an LP; it differs from Fig. 6 in that the $X$ variables are real numbers instead of integers, and it is also more constrained: it uses $s/2$ instead of $s$ in C1, C2, C8, and C9. With the solution to this LP, we obtain a new optimization problem (see Fig. 8). This optimization problem is as follows. First, we solve the LP (specified by Fig. 7) and obtain a solution $X'$ that is a vertex solution. With this solution $X'$, we consider the MILP (specified by Fig. 6) and require that, for those $i$ such that $x'^1_i = 1$ or $x'^2_i = 1$, the value of $x^1_i$ be equal to $x'^1_i$ and the value of $x^2_i$ be equal to $x'^2_i$. Some $x_i$ variables may remain; these are assigned values by solving an MILP (as specified by Fig. 8).

Lemma 7. If $\theta^1$ and $\theta^2$ are given and $X$ satisfies Fig. 8, then $X$ satisfies Fig. 4.

Proof: Follows from the discussion in this subsection. ■

Hence, solving Fig. 8 yields an assignment of tasks to processor types.

B.  Stating  the  new  algorithm 

We let solvePTMILP($\tau$, $\pi$, $\theta^1$, $\theta^2$) denote a function that takes as input a taskset $\tau$, a computer platform $\pi$, and $\theta^1$ and $\theta^2$, and returns a tuple $(f, X)$, where $f$ is a boolean and $X$ is a matrix, with the following interpretation: if Fig. 8 is feasible, then $f$ is true and $X$ is the solution; if Fig. 8 is infeasible, then $f$ is false and $X$ is undefined.

Theorem 1. If $(f, X) =$ solvePTMILP($\tau$, $\pi$, $\theta^1$, $\theta^2$) and $f$ is true, and tasks are assigned to processor types according to the $X$-matrix and scheduled with gEDF on each processor type, then all deadlines will be met at run-time.


Algorithm 1: An algorithm for evaluating the function solvePTMILP($\tau$, $\Pi$, $\theta^1$, $\theta^2$).

Input: A taskset $\tau$, a two-type platform $\Pi$, and the parameters $\theta^1$ and $\theta^2$
Output: A tuple $(f, X)$ where $f$ is a boolean and $X$ is a matrix

1  Partition $\tau$ into $H12$, $H1$, $H2$ and $L$
2  if ($H12 = \emptyset$) then
3      Solve the LP in Fig. 7 and obtain a vertex solution
4      if (this LP is feasible) then
5          Let $X'$ denote this solution.
6          Let $F$ denote the set of indices of tasks in $L$ such that $(0 < x'^1_i < 1) \lor (0 < x'^2_i < 1)$.
7          Let us introduce $\hat{X}$, which is an assignment of values to the $x_i$-variables whose subscript index is in $F$; this assignment is initialized to be undefined.
8          Let us introduce a local boolean variable foundPTMILP and initialize it to false.
9          foreach assignment of 0-1 to the $x_i$-variables whose subscript index is in $F$ do
10             Evaluate if Fig. 8 is satisfied for this assignment
11             if (the above evaluation yields true) then
12                 Let $\tilde{X}$ denote the assignment of 0-1 to the $x_i$-variables whose subscript index is in $F$
13                 if (foundPTMILP = false) then
14                     Set foundPTMILP to true
15                     Set $\hat{X}$ to $\tilde{X}$
16                 end
17             end
18         end
19         if (foundPTMILP = true) then
20             Form the matrix $X$ as follows:
21             $\forall i \in F$: Assign $x^1_i$ and $x^2_i$ according to $\hat{X}$.
22             $\forall i \in L \setminus F$: Assign $x^1_i$ and $x^2_i$ according to $X'$.
23             $\forall i \in H1$: Assign $x^1_i = 1$ and $x^2_i = 0$.
24             $\forall i \in H2$: Assign $x^1_i = 0$ and $x^2_i = 1$.
25             return (true, $X$)
26         else
27             return (false, $X'$)
28         end
29     else
30         return (false, $X$), where $X$ is undefined.
31     end
32 else
33     return (false, $X$), where $X$ is undefined.
34 end
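The enumeration in lines 6-28 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: all function and parameter names are hypothetical, the LP of Fig. 7 is assumed to have been solved elsewhere (its vertex solution is passed in as `x_prime`), the check against Fig. 8 is supplied by the caller as `satisfies_fig8`, and we assume that C3 forces $x^1_i + x^2_i = 1$, so that each task in $F$ contributes one binary choice (matching the $2^{|F|}$ iteration count used in Lemma 10).

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Maps a task index to its pair (x^1_i, x^2_i).
Assignment = Dict[int, Tuple[float, float]]


def round_fractional_tasks(
    x_prime: Assignment,
    L: List[int],
    H1: List[int],
    H2: List[int],
    satisfies_fig8: Callable[[Assignment], bool],
    eps: float = 1e-9,
) -> Tuple[bool, Assignment]:
    """Sketch of lines 6-28 of Algorithm 1; all names are hypothetical."""
    # Line 6: F holds the tasks of L whose LP values are fractional.
    F = [i for i in L if any(eps < v < 1 - eps for v in x_prime[i])]
    found = None
    # Line 9: one 0-1 choice per task in F, i.e. 2^|F| candidate assignments
    # (assumes C3 forces x^1_i + x^2_i = 1).
    for bits in product((1, 0), repeat=len(F)):
        x_tilde = {i: (float(b), float(1 - b)) for i, b in zip(F, bits)}
        if satisfies_fig8(x_tilde):   # lines 10-11
            found = x_tilde           # lines 12-15: keep the first hit
            break
    if found is None:
        return False, x_prime         # line 27
    # Lines 20-24: assemble the full matrix X.
    X: Assignment = dict(found)
    X.update({i: x_prime[i] for i in L if i not in F})
    X.update({i: (1.0, 0.0) for i in H1})
    X.update({i: (0.0, 1.0) for i in H2})
    return True, X                    # line 25
```

Returning at the first satisfying assignment gives the same result as the pseudocode, which stores only the first hit in $\hat{X}$ but keeps looping.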


Algorithm 2: The new intra-migrative task assignment algorithm for two-type heterogeneous multiprocessors.

Input: A taskset $\tau$ and a two-type platform $\Pi$
Output: Assignment of tasks to processor types indicated by matrix $X$

1  $(f, X)$ := solvePTMILP($\tau$, $\Pi$, $1/R(\Pi)$, $1/R(\Pi)$)
2  if ($f$ = true) then
3      declare SUCCESS and stop
4  else
5      declare FAILURE and stop
6  end


We  now  define  a  term  used  in  the  rest  of  the  paper. 

Definition 8. $R(\Pi) = 4 + \max\left(1 - \frac{1}{m_1}, 1 - \frac{1}{m_2}\right)$
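As an exact-arithmetic sanity check of Definition 8, the sketch below computes $R(\Pi)$ and verifies, for a range of processor counts, the inequalities used later in the proof of Lemma 18 (with $t = s = 1$, which is enough because both sides scale linearly in $t$ and $s$). The function names are illustrative only.

```python
from fractions import Fraction


def speedup_bound(m1: int, m2: int) -> Fraction:
    """R(Pi) = 4 + max(1 - 1/m1, 1 - 1/m2), computed exactly."""
    return 4 + max(1 - Fraction(1, m1), 1 - Fraction(1, m2))


def lemma18_inequalities_hold(m: int, R: Fraction) -> bool:
    """For a type with m processors, check that the Fig. 11 right-hand sides
    (with t = s = 1) do not exceed the corresponding Fig. 12 right-hand sides."""
    fig12_rhs = (m - (m - 1) / R) / 2   # (m - (m - 1)/R) * s * 1/2
    c1_ok = m * 2 / R <= fig12_rhs      # C1/C2: m * t * (s/R) * 2
    c8_ok = m / R <= fig12_rhs          # C8/C9: m * (s/R)
    return c1_ok and c8_ok
```

The C1/C2 check holds with equality exactly when $R(\Pi) = 5 - 1/m$, which is why $R(\Pi)$ is defined through the larger of the two terms $1 - 1/m_1$ and $1 - 1/m_2$; a smaller value such as $R = 4$ with $m = 2$ fails the check.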

Algorithm 1 lists the pseudo-code for evaluating the function solvePTMILP($\tau$, $\Pi$, $\theta^1$, $\theta^2$). Algorithm 2 shows how to assign tasks to processor types.


Proof of Theorem 1: Follows from Lemma 7 and the fact that Eq. (6) is a schedulability test. ■


Theorem 2. If Algorithm 2 declares SUCCESS and tasks are assigned to processor types according to the $X$-matrix and tasks are scheduled with gEDF on each processor type, then all deadlines will be met at run-time.

Proof:  Follows  from  Theorem  1.  ■ 

C.  Proving  the  time  complexity  of  the  new  algorithm 
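Lemma 8. $|TS(\tau, \theta^1)| = \left\lfloor \log_2 \frac{\Delta(\tau)}{DMIN \times (1-\theta^1)} \right\rfloor + 2$ and $|TS(\tau, \theta^2)| = \left\lfloor \log_2 \frac{\Delta(\tau)}{DMIN \times (1-\theta^2)} \right\rfloor + 2$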

Proof:  Follows  from  the  definition  of  TS.  ■ 
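Lemma 8 implies that the number of testing points grows only logarithmically in $\Delta(\tau)/DMIN$. A direct Python transcription of the formula is sketched below; `delta`, `dmin` and `theta` are hypothetical parameter names standing for $\Delta(\tau)$, $DMIN$ and $\theta$.

```python
import math


def ts_size(delta: float, dmin: float, theta: float) -> int:
    """|TS(tau, theta)| = floor(log2(Delta / (DMIN * (1 - theta)))) + 2, as in Lemma 8."""
    if not 0 <= theta < 1:
        raise ValueError("theta must lie in [0, 1)")
    return math.floor(math.log2(delta / (dmin * (1 - theta)))) + 2
```

For example, $\Delta(\tau) = 16$, $DMIN = 1$ and $\theta = 0.5$ give $\lfloor \log_2 32 \rfloor + 2 = 7$ testing points.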

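Lemma 9. After line 6 of Algorithm 1 has executed, it holds that: $|F| \le \left\lfloor \log_2 \frac{\Delta(\tau)}{DMIN \times (1-\theta^1)} \right\rfloor + \left\lfloor \log_2 \frac{\Delta(\tau)}{DMIN \times (1-\theta^2)} \right\rfloor + 6$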

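Proof: In the LP on line 3 of Algorithm 1, there are
1) $2 \times |L|$ variables and
2) $\left\lfloor \log_2 \frac{\Delta(\tau)}{DMIN \times (1-\theta^1)} \right\rfloor + \left\lfloor \log_2 \frac{\Delta(\tau)}{DMIN \times (1-\theta^2)} \right\rfloor + 6 + |L|$ constraints.

The former follows from the fact that there are no tasks in $H12$, and for the tasks in $H1$ and $H2$ there are no $x$-variables (they are constants, as stated by C5 and C6). The latter can be seen from the following: there are $\left\lfloor \log_2 \frac{\Delta(\tau)}{DMIN \times (1-\theta^1)} \right\rfloor + 2$ constraints of C1 and $\left\lfloor \log_2 \frac{\Delta(\tau)}{DMIN \times (1-\theta^2)} \right\rfloor + 2$ constraints of C2 (follows from Lemma 8); one constraint of C8; one constraint of C9; and $|L|$ constraints of C3.

Let $X'$ denote the solution of the LP in Fig. 7. Since for each vertex solution of an LP the number of non-zero variables is at most the number of constraints, there are at most $\left\lfloor \log_2 \frac{\Delta(\tau)}{DMIN \times (1-\theta^1)} \right\rfloor + \left\lfloor \log_2 \frac{\Delta(\tau)}{DMIN \times (1-\theta^2)} \right\rfloor + 6 + |L|$ non-zero values of the $X'$ variables. For each task in $L \setminus F$, there is exactly one non-zero value in $X'$. For each task in $F$, there are exactly two non-zero values in $X'$. Hence, there are $1 \times (|L| - |F|) + 2 \times |F|$ non-zero values of the $X'$ variables. Consequently:
$$1 \times (|L| - |F|) + 2 \times |F| \le \left\lfloor \log_2 \frac{\Delta(\tau)}{DMIN \times (1-\theta^1)} \right\rfloor + \left\lfloor \log_2 \frac{\Delta(\tau)}{DMIN \times (1-\theta^2)} \right\rfloor + 6 + |L|$$
Rewriting yields:
$$|F| \le \left\lfloor \log_2 \frac{\Delta(\tau)}{DMIN \times (1-\theta^1)} \right\rfloor + \left\lfloor \log_2 \frac{\Delta(\tau)}{DMIN \times (1-\theta^2)} \right\rfloor + 6$$
This states the lemma. ■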

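Lemma 10. The number of iterations of the for-loop on line 9 of Algorithm 1 is at most $\left(\frac{TMAX + DMAX}{DMIN}\right)^2 \times \frac{256}{(1-\theta^1) \times (1-\theta^2)}$

Proof: The number of iterations is at most $2^{|F|}$. Using Lemma 9 yields that the number of iterations is at most:
$$2^{\left\lfloor \log_2 \frac{\Delta(\tau)}{DMIN \times (1-\theta^1)} \right\rfloor} \times 2^{\left\lfloor \log_2 \frac{\Delta(\tau)}{DMIN \times (1-\theta^2)} \right\rfloor} \times 2^6$$
Eq. (8) yields $\Delta(\tau) \le 2 \times (TMAX + DMAX)$. Using it and rewriting yields that the number of iterations is at most:
$$2^{\left\lfloor \log_2 \frac{2 \times (TMAX + DMAX)}{DMIN \times (1-\theta^1)} \right\rfloor} \times 2^{\left\lfloor \log_2 \frac{2 \times (TMAX + DMAX)}{DMIN \times (1-\theta^2)} \right\rfloor} \times 2^6$$
Further rewriting (using $2^{\lfloor \log_2 a \rfloor} \le a$) yields the lemma. ■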
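The chain of bounds in Lemmas 9 and 10 can be checked numerically. The sketch below (hypothetical function names; floating-point arithmetic is adequate for these illustrative values) evaluates the bound on $|F|$ from Lemma 9, the resulting $2^{|F|}$ iteration count, and the closed form of Lemma 10, taking the worst case $\Delta(\tau) = 2 \times (TMAX + DMAX)$ permitted by Eq. (8).

```python
import math


def f_bound(delta: float, dmin: float, theta1: float, theta2: float) -> int:
    """Right-hand side of Lemma 9: an upper bound on |F|."""
    return (math.floor(math.log2(delta / (dmin * (1 - theta1))))
            + math.floor(math.log2(delta / (dmin * (1 - theta2))))
            + 6)


def iteration_bound(delta: float, dmin: float, theta1: float, theta2: float) -> int:
    """2^|F| iterations of the for-loop on line 9, bounded via Lemma 9."""
    return 2 ** f_bound(delta, dmin, theta1, theta2)


def lemma10_bound(tmax: float, dmax: float, dmin: float,
                  theta1: float, theta2: float) -> float:
    """Closed form of Lemma 10 (valid whenever Delta <= 2 * (TMAX + DMAX))."""
    return ((tmax + dmax) / dmin) ** 2 * 256 / ((1 - theta1) * (1 - theta2))
```

With $\Delta(\tau) = 150$, $DMIN = 1$ and $\theta^1 = \theta^2 = 0.25$, Lemma 9 gives $|F| \le 7 + 7 + 6 = 20$, i.e. at most $2^{20}$ iterations.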

Lemma 11. The time complexity of Algorithm 1 is
$$O\left(\mathit{poly} + \left(\frac{TMAX + DMAX}{DMIN}\right)^2 \times \frac{1}{(1-\theta^1) \times (1-\theta^2)}\right)$$

Proof: Follows from the facts that (i) line 3 of Algorithm 1 can be executed in polynomial time, because LPs can be solved in polynomial time [41] and a solution can be converted to a vertex solution in polynomial time, and (ii) the number of combinations iterated through in the for-loop of line 9 is at most $\left(\frac{TMAX + DMAX}{DMIN}\right)^2 \times \frac{256}{(1-\theta^1) \times (1-\theta^2)}$ (follows from Lemma 10). ■

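Theorem 3. The time complexity of Algorithm 2 is
$$O\left(\mathit{poly} + \left(\frac{TMAX + DMAX}{DMIN}\right)^2\right)$$

Proof: Since Algorithm 2 calls solvePTMILP once with $\theta^1 = \theta^2 = 1/R(\Pi)$, it follows (using Lemma 11) that the time complexity of Algorithm 2 is
$$O\left(\mathit{poly} + \left(\frac{TMAX + DMAX}{DMIN}\right)^2 \times \frac{1}{(1 - 1/R(\Pi)) \times (1 - 1/R(\Pi))}\right)$$
Observing that $4 \le R(\Pi)$ yields $\frac{1}{(1 - 1/R(\Pi))^2} \le \frac{16}{9}$, and hence the time complexity of Algorithm 2 is $O\left(\mathit{poly} + \left(\frac{TMAX + DMAX}{DMIN}\right)^2\right)$. ■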
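The step "$4 \le R(\Pi)$" in the proof of Theorem 3 can be made concrete: with $\theta^1 = \theta^2 = 1/R(\Pi)$, the factor $1/((1-\theta^1)(1-\theta^2))$ from Lemma 11 never exceeds $16/9$, and it reaches that value only when $m_1 = m_2 = 1$. The exact-arithmetic sketch below (illustrative function name, with $R(\Pi)$ taken from Definition 8) evaluates the factor for a few platforms.

```python
from fractions import Fraction


def theta_factor(m1: int, m2: int) -> Fraction:
    """1 / ((1 - theta^1)(1 - theta^2)) for theta^1 = theta^2 = 1/R(Pi),
    where R(Pi) = 4 + max(1 - 1/m1, 1 - 1/m2) as in Definition 8."""
    R = 4 + max(1 - Fraction(1, m1), 1 - Fraction(1, m2))
    theta = 1 / R
    return 1 / ((1 - theta) * (1 - theta))
```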

D.  Proving  the  speedup  factor  of  the  new  algorithm 

Lemma 12. Consider a taskset $\tau$ and a computer platform $\Pi$. If $\tau$ is intra-migrative feasible on $\Pi$ then there exists a matrix $X$ such that all constraints in Fig. 9 are satisfied.

Proof:  Follows  from  the  fact  that  Eq.  (7)  is  a  necessary 
condition  for  feasibility.  ■ 

Lemma 13. Consider a taskset $\tau$ and a computer platform $\Pi$. If $\tau$ is intra-migrative feasible on $\Pi \times 1/R(\Pi)$ then there exists a matrix $X$ such that all constraints in Fig. 10 are satisfied.

Proof: Follows from applying Lemma 12 on $\Pi \times (1/R(\Pi))$ and then considering $t \to \infty$ on C1 and C2, which yields C8 and C9, respectively. ■

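Lemma 14. $\forall Q > 0$:
$$\left(\forall t > 0: \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, t, \theta^1, s)\right) \le m_1 \times t \times Q\right) \Rightarrow \left(\forall t \in TS(\tau, \theta^1): \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^{*1}(\tau_i, t, \theta^1, s, \tau)\right) \le m_1 \times t \times Q \times 2\right)$$

Proof: See Appendix. ■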

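Lemma 15. $\forall Q > 0$:
$$\left(\forall t > 0: \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^2(\tau_i, t, \theta^2, s)\right) \le m_2 \times t \times Q\right) \Rightarrow \left(\forall t \in TS(\tau, \theta^2): \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^{*2}(\tau_i, t, \theta^2, s, \tau)\right) \le m_2 \times t \times Q \times 2\right)$$

Proof: See Appendix. ■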

Lemma 16. Consider a taskset $\tau$ and a computer platform $\Pi$. If $X$ satisfies Fig. 10 then $X$ satisfies Fig. 11.

Proof: Follows from applying Lemma 14 on C1 in Fig. 10 and applying Lemma 15 on C2 in Fig. 10. ■

Lemma 17. Consider a taskset $\tau$ and a computer platform $\Pi$. If $\tau$ is intra-migrative feasible on $\Pi \times 1/R(\Pi)$ then there exists a matrix $X$ such that all constraints in Fig. 11 are satisfied.

Proof: Follows from Lemma 13 and Lemma 16. ■

Lemma 18. Consider a taskset $\tau$ and a computer platform $\Pi$. If $\tau$ is intra-migrative feasible on $\Pi \times 1/R(\Pi)$ then there exists a matrix $X$ such that all constraints in Fig. 12 are satisfied.

Proof: By Definition 8, $R(\Pi) \ge 5 - 1/m_1$ and $R(\Pi) \ge 5 - 1/m_2$. Algebraic manipulations of $R(\Pi)$ then yield, for all $t > 0$ and $s > 0$:
$$m_1 \times t \times \frac{s}{R(\Pi)} \times 2 \le \left(m_1 - \frac{m_1 - 1}{R(\Pi)}\right) \times t \times \frac{s}{2}$$
$$m_2 \times t \times \frac{s}{R(\Pi)} \times 2 \le \left(m_2 - \frac{m_2 - 1}{R(\Pi)}\right) \times t \times \frac{s}{2}$$
$$m_1 \times \frac{s}{R(\Pi)} \le \left(m_1 - \frac{m_1 - 1}{R(\Pi)}\right) \times \frac{s}{2}$$
$$m_2 \times \frac{s}{R(\Pi)} \le \left(m_2 - \frac{m_2 - 1}{R(\Pi)}\right) \times \frac{s}{2}$$
Hence, if $X$ satisfies Fig. 11 then it also satisfies Fig. 12. Combining this with Lemma 17 yields the lemma. ■

Lemma 19. Consider a taskset $\tau$ and a computer platform $\Pi$. If there exists a matrix $X$ such that all constraints in Fig. 12 are satisfied then Algorithm 2 declares SUCCESS.

Proof: Let us suppose that the lemma was false. Then there is a taskset $\tau$ and a computer platform $\Pi$ such that

there exists a matrix $X$ such that all constraints in Fig. 12 are satisfied   (15)

and Algorithm 2 declares FAILURE.

Relaxing C4 in Fig. 12 yields:

there exists a matrix $X$ such that all constraints in Fig. 13 are satisfied   (16)

Eq. (15) (specifically, C7 in Fig. 12) yields

$H12(1/R(\Pi), 1/R(\Pi), \tau) = \emptyset$   (17)

Consider Fig. 7 with $\theta^1 = \theta^2 = 1/R(\Pi)$ and compare it with Fig. 13. They are identical. Hence, from Eq. (16):

there is a matrix $X$ such that all constraints in Fig. 7 are satisfied for $\theta^1 = \theta^2 = 1/R(\Pi)$   (18)

From the statement that Algorithm 2 declares FAILURE, it follows that Algorithm 1 returns a tuple whose first element is false. Let us explore the points at which Algorithm 1 can return such a tuple.

Case 1. Algorithm 1 returns on line 33.
The condition of the case yields $H12(1/R(\Pi), 1/R(\Pi), \tau) \neq \emptyset$. But this contradicts Eq. (17).

Case 2. Algorithm 1 returns on line 30.
From the condition of the case, it follows that there exists no matrix $X$ such that all constraints in Fig. 7 are satisfied for $\theta^1 = \theta^2 = 1/R(\Pi)$. But this contradicts Eq. (18).

Case 3. Algorithm 1 returns on line 27.
From the condition of the case, it follows that foundPTMILP is false when Algorithm 1 returns on line 27. Let us partition $\tau$ into $F$ and $\tau \setminus F$. Note that for $\tau \setminus F$ it holds that $X'$ satisfies Fig. 7, and since this set of tasks has $x^1_i$ and $x^2_i$ being integers (follows from the fact that it does not contain the tasks in $F$), it follows that $X'$ also satisfies the following constraints: Fig. 8 where, in the expression on the right-hand side of C1, C2, C8 and C9, the symbol $s$ is replaced by $s/2$. Note that $F \subseteq \tau$ and hence, from Eq. (15), it follows that for $F$ there is an $X$ that satisfies the following constraints: Fig. 8 where, in the expression on the right-hand side of C1, C2, C8 and C9, the symbol $s$ is replaced by $s/2$. Adding $X'$ and $X$ gives us a new matrix that satisfies Fig. 8, and this yields that foundPTMILP is true. This is a contradiction.

It can be seen that if the lemma is false then each case results in a contradiction. Hence, the lemma is true. ■

C1. $\forall t > 0: \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, t, 1/R(\Pi), s) \times x^1_i\right) \le m_1 \times t \times (s/R(\Pi))$
C2. $\forall t > 0: \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^2(\tau_i, t, 1/R(\Pi), s) \times x^2_i\right) \le m_2 \times t \times (s/R(\Pi))$
C3-C4. Same as C3-C4 in Figure 9
C5. $\forall \tau_i \in H1(1/R(\Pi), 1/R(\Pi), \tau): x^1_i = 1$
C6. $\forall \tau_i \in H2(1/R(\Pi), 1/R(\Pi), \tau): x^2_i = 1$
C7. $H12(1/R(\Pi), 1/R(\Pi), \tau) = \emptyset$
C8. $\sum_{\tau_i \in \tau} (C^1_i / T_i) \times x^1_i \le m_1 \times (s/R(\Pi))$
C9. $\sum_{\tau_i \in \tau} (C^2_i / T_i) \times x^2_i \le m_2 \times (s/R(\Pi))$

Fig. 10: Constraints expressing a necessary intra-migrative feasibility condition; rewritten.

C1. $\forall t \in TS(\tau, 1/R(\Pi)): \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^{*1}(\tau_i, t, 1/R(\Pi), s, \tau) \times x^1_i\right) \le m_1 \times t \times (s/R(\Pi)) \times 2$
C2. $\forall t \in TS(\tau, 1/R(\Pi)): \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^{*2}(\tau_i, t, 1/R(\Pi), s, \tau) \times x^2_i\right) \le m_2 \times t \times (s/R(\Pi)) \times 2$
C3-C9. Same as C3-C9 in Figure 10

Fig. 11: Constraints expressing a necessary intra-migrative feasibility condition; rewritten further.

C1. $\forall t \in TS(\tau, 1/R(\Pi)): \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^{*1}(\tau_i, t, 1/R(\Pi), s, \tau) \times x^1_i\right) \le (m_1 - (m_1 - 1) \times 1/R(\Pi)) \times t \times s \times 1/2$
C2. $\forall t \in TS(\tau, 1/R(\Pi)): \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^{*2}(\tau_i, t, 1/R(\Pi), s, \tau) \times x^2_i\right) \le (m_2 - (m_2 - 1) \times 1/R(\Pi)) \times t \times s \times 1/2$
C3-C7. Same as C3-C7 in Figure 11
C8. $\sum_{\tau_i \in \tau} (C^1_i / T_i) \times x^1_i \le (m_1 - (m_1 - 1) \times 1/R(\Pi)) \times s \times 1/2$
C9. $\sum_{\tau_i \in \tau} (C^2_i / T_i) \times x^2_i \le (m_2 - (m_2 - 1) \times 1/R(\Pi)) \times s \times 1/2$

Fig. 12: Constraints expressing a necessary intra-migrative feasibility condition; rewritten even more.

C1-C3. Same as C1-C3 in Figure 12
C4. $\forall \tau_i \in \tau: x^1_i \ge 0$ and $x^2_i \ge 0$
C5-C9. Same as C5-C9 in Figure 12

Fig. 13: Constraints expressing a necessary intra-migrative feasibility condition; rewritten to an LP.

Theorem 4. Consider a taskset $\tau$ and a computer platform $\Pi$. If $\tau$ is intra-migrative feasible on $\Pi \times 1/R(\Pi)$ then Algorithm 2 declares SUCCESS.

Proof: Follows from Lemma 18 and Lemma 19. ■

Theorem 5. Algorithm 2 has speedup factor $R(\Pi) < 5$.

Proof: Follows from Theorem 4 and Definition 8. ■

V.  Evaluation 

Recall (from Theorem 3) that the time complexity of our new algorithm (Algorithm 2) is pseudo-polynomial; this bounds its worst-case running time. We would, however, also like to know its average-case running time. Therefore, we developed a tool that performs the assignment as specified by Algorithm 2, ran experiments on randomly generated tasksets, and measured the running time.


m1    m2    |τ|   TR      DTR     maxt (s)
1     16    16    0.0100  1.0000  0.001
1     16    16    0.0100  0.0100  0.005
1     16    256   0.0100  1.0000  0.026
1     16    256   0.0100  0.0100  0.053
1     256   16    0.0100  1.0000  0.001
1     256   16    0.0100  0.0100  0.006
1     256   256   0.0100  1.0000  0.017
1     256   256   0.0100  0.0100  0.080
16    16    16    0.0100  1.0000  0.001
16    16    16    0.0100  0.0100  0.004
16    16    256   0.0100  1.0000  0.026
16    16    256   0.0100  0.0100  0.079
16    256   16    0.0100  1.0000  0.001
16    256   16    0.0100  0.0100  0.005
16    256   256   0.0100  1.0000  0.018
16    256   256   0.0100  0.0100  0.055
256   256   16    0.0100  1.0000  0.001
256   256   16    0.0100  0.0100  0.006
256   256   256   0.0100  1.0000  0.001
256   256   256   0.0100  0.0100  0.050

Fig. 14: Execution time of our new algorithm.


We generated systems as follows. We let $m_1$, $m_2$ and $|\tau|$ each be in the set {1, 16, 256}. We let TR be a parameter which specifies that TR ≤ ^ ≤ 1, and we let DTR specify that for each task $\tau_i$, it holds that $D_i \in [DTR \times T_i, T_i]$. We let TR be 0.01 and DTR be in the set {0.01, 1}. All combinations of these were explored, and 6 tasksets were generated for each combination. The number of stages of a task is a uniformly distributed random number in {1..5}, and the number of segments of a stage is a uniformly distributed random number in {1..max($m_1$, $m_2$)}. Recall that Eq. (7) expresses a necessary feasibility test for parallel tasks on a homogeneous multiprocessor. This can be used to obtain an MILP which is a necessary condition for intra-migrative feasibility on a two-type heterogeneous platform. We scale the execution time parameters so that this condition is true but increasing all execution times by an arbitrarily small amount makes the condition false. The intuition behind this scaling is that it makes the tasksets as challenging as possible. Generating tasksets this way is very time-consuming, though; this is the reason why we only generated 6 tasksets per combination.
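The structural part of the generation procedure can be sketched as follows. This is only a partial, hypothetical reconstruction: the number of stages, the number of segments per stage and the deadline range follow the description above, but the period of each task is drawn from an arbitrary illustrative range (the TR rule is not reproduced), and the execution-time scaling against the necessary MILP condition is omitted entirely.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class ParallelTask:
    period: float                    # T_i (hypothetical choice, see below)
    deadline: float                  # D_i in [DTR * T_i, T_i]
    segments_per_stage: List[int]    # number of segments in each stage


def generate_task(m1: int, m2: int, period: float, dtr: float,
                  rng: random.Random) -> ParallelTask:
    n_stages = rng.randint(1, 5)                                # uniform in {1..5}
    segments = [rng.randint(1, max(m1, m2)) for _ in range(n_stages)]
    deadline = rng.uniform(dtr * period, period)                 # constrained deadline
    return ParallelTask(period, deadline, segments)


def generate_taskset(n: int, m1: int, m2: int, dtr: float,
                     seed: int = 0) -> List[ParallelTask]:
    rng = random.Random(seed)
    # Hypothetical period range; execution times are not generated here.
    return [generate_task(m1, m2, rng.uniform(1.0, 100.0), dtr, rng)
            for _ in range(n)]
```

Seeding a private `random.Random` instance keeps each generated taskset reproducible without touching the global random state.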

For each combination, the maximum time required by the algorithm was obtained; this is referred to as maxt and is measured in seconds. The results are shown in Fig. 14.

It can be seen that for these combinations, the algorithm required at most 0.079 seconds. We have conducted more extensive experiments with a wider variation of parameters and found that in these experiments, the algorithm required at most 2.44 seconds. See [42] for details, and also for an algorithm that improves the performance further.
VI. Conclusions

Scheduling real-time tasks on a heterogeneous multiprocessor has received increasing attention from researchers in recent years, but no solution with a proven speedup factor was available for parallel tasks. Therefore, in this paper, we have presented the first algorithm for scheduling parallel tasks on a heterogeneous multiprocessor with a proven speedup factor. We did so by focusing on constrained-deadline sporadic tasks and a heterogeneous multiprocessor where processors are of two types, and we presented a new algorithm that assigns tasks to processor types and then applies global-Earliest-Deadline-First on each type of processor. Our new algorithm has pseudo-polynomial time complexity and a speedup factor of at most 5.
Appendix

A.  Proof  of  Lemma  4 

In this section, we prove Lemma 4, and we do this incrementally, i.e., by proving some basic results and then merging them to obtain the desired result.

Lemma 20. $\forall t > 0, t \le \Delta(\tau): \mathrm{ffdbf}(\tau_i, t, v, s) \le \mathrm{ffdbf}(\tau_i, 2^{\lfloor \log_2 t \rfloor + 1}, v, s)$

Proof: Follows from Lemma 1 (monotonicity) and observing that $t < 2^{\lfloor \log_2 t \rfloor + 1}$. ■

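Lemma 21. $\forall t > \Delta(\tau): \mathrm{ffdbf}(\tau_i, t, v, s) \le \mathrm{ffdbf}(\tau_i, \Delta(\tau), v, s) + \left(C_i + \frac{C_i}{T_i} \times (t - \Delta(\tau))\right)$

Proof: Algebraic manipulations yield:
$$\begin{aligned}
t &= \Delta(\tau) + (t - \Delta(\tau)) \\
&= \Delta(\tau) + \left\lfloor \frac{t - \Delta(\tau)}{T_i} \right\rfloor \times T_i + \left((t - \Delta(\tau)) \bmod T_i\right) \\
&\le \Delta(\tau) + \left\lfloor \frac{t - \Delta(\tau)}{T_i} \right\rfloor \times T_i + T_i \\
&= \Delta(\tau) + \left(\left\lfloor \frac{t - \Delta(\tau)}{T_i} \right\rfloor + 1\right) \times T_i
\end{aligned}$$
Using this on Lemma 1 yields:
$$\mathrm{ffdbf}(\tau_i, t, v, s) \le \mathrm{ffdbf}\left(\tau_i, \Delta(\tau) + \left(\left\lfloor \frac{t - \Delta(\tau)}{T_i} \right\rfloor + 1\right) \times T_i, v, s\right)$$
Using Lemma 2 yields:
$$\mathrm{ffdbf}(\tau_i, t, v, s) \le \mathrm{ffdbf}(\tau_i, \Delta(\tau), v, s) + \left(\left\lfloor \frac{t - \Delta(\tau)}{T_i} \right\rfloor + 1\right) \times C_i$$
Relaxing the bound on the right-hand side and rewriting yields:
$$\mathrm{ffdbf}(\tau_i, t, v, s) \le \mathrm{ffdbf}(\tau_i, \Delta(\tau), v, s) + (t - \Delta(\tau)) \times \frac{C_i}{T_i} + C_i$$
This states the lemma. ■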

We  now  restate  Lemma  4  and  prove  it. 

Lemma 4. $\mathrm{ffdbf}(\tau_i, t, v, s) \le \mathrm{ffdbf}^*(\tau_i, t, v, s, \tau)$

Proof: We need to consider two cases.

Case 1. $t \le \Delta(\tau)$: For this case, Lemma 20 along with the definition of $\mathrm{ffdbf}^*$ in Eq. (9) proves the lemma.

Case 2. $t > \Delta(\tau)$: For this case, Lemma 21 along with the definition of $\mathrm{ffdbf}^*$ in Eq. (9) proves the lemma. ■


B.  Proof  of  Lemma  14  and  Lemma  15 

In  this  section,  we  prove  Lemma  14  and  Lemma  15  and  once 
again  we  do  this  incrementally,  i.e.,  by  proving  some  basic 
results  and  then  merging  them  to  obtain  the  desired  result. 

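Lemma 22. $\forall t \in (\Delta(\tau), \Delta(\tau) + T_i]: \mathrm{ffdbf}^*(\tau_i, t, v, s, \tau) \le \mathrm{ffdbf}(\tau_i, t, v, s) + 2 \times C_i$

Proof: Since $t > \Delta(\tau)$ and because of Lemma 1 (monotonicity), we have:
$$\mathrm{ffdbf}(\tau_i, \Delta(\tau), v, s) \le \mathrm{ffdbf}(\tau_i, t, v, s) \quad (19)$$
Rewriting yields:
$$\mathrm{ffdbf}(\tau_i, \Delta(\tau), v, s) + C_i + \frac{C_i}{T_i} \times T_i \le \mathrm{ffdbf}(\tau_i, t, v, s) + 2 \times C_i \quad (20)$$
From $t \in (\Delta(\tau), \Delta(\tau) + T_i]$, we obtain that $t - \Delta(\tau) \le T_i$. Applying this on Eq. (20) yields: $\forall t \in (\Delta(\tau), \Delta(\tau) + T_i]$:
$$\mathrm{ffdbf}(\tau_i, \Delta(\tau), v, s) + C_i + \frac{C_i}{T_i} \times (t - \Delta(\tau)) \le \mathrm{ffdbf}(\tau_i, t, v, s) + 2 \times C_i \quad (21)$$
Using the definition of $\mathrm{ffdbf}^*$ yields: $\forall t \in (\Delta(\tau), \Delta(\tau) + T_i]$:
$$\mathrm{ffdbf}^*(\tau_i, t, v, s, \tau) \le \mathrm{ffdbf}(\tau_i, t, v, s) + 2 \times C_i$$
Hence the proof. ■

Lemma 23. $\forall t > \Delta(\tau): \mathrm{ffdbf}^*(\tau_i, t, v, s, \tau) \le \mathrm{ffdbf}(\tau_i, t, v, s) + 2 \times C_i$

Proof: We prove this by contradiction. If the lemma was false then there would exist a $t$ such that $t > \Delta(\tau)$ and
$$\mathrm{ffdbf}^*(\tau_i, t, v, s, \tau) > \mathrm{ffdbf}(\tau_i, t, v, s) + 2 \times C_i$$
If $t > \Delta(\tau) + T_i$ then decreasing $t$ by $T_i$ decreases the left-hand side and the right-hand side of the above inequality by the same amount ($C_i$). Hence, we can decrease $t$ by $T_i$ until it holds that $t \in (\Delta(\tau), \Delta(\tau) + T_i]$. This gives us that there exists a $t$ such that $t \in (\Delta(\tau), \Delta(\tau) + T_i]$ and
$$\mathrm{ffdbf}^*(\tau_i, t, v, s, \tau) > \mathrm{ffdbf}(\tau_i, t, v, s) + 2 \times C_i$$
But this contradicts Lemma 22, and hence it is not possible that the lemma under discussion is false. Hence the proof. ■

Lemma 24. $\forall t > \Delta(\tau): \mathrm{ffdbf}^*(\tau_i, t, v, s, \tau) \le \mathrm{ffdbf}(\tau_i, t, v, s) \times 2$

Proof: Since $t > \Delta(\tau)$, it follows that $t > TMAX + DMAX$, and then it follows that:
$$\mathrm{ffdbf}(\tau_i, t, v, s) \ge 2 \times C_i \quad (22)$$
Using Lemma 23 and Eq. (22) yields: $\forall t > \Delta(\tau)$:
$$\frac{\mathrm{ffdbf}^*(\tau_i, t, v, s, \tau)}{\mathrm{ffdbf}(\tau_i, t, v, s)} \le \frac{\mathrm{ffdbf}(\tau_i, t, v, s) + 2 \times C_i}{\mathrm{ffdbf}(\tau_i, t, v, s)} = 1 + \frac{2 \times C_i}{\mathrm{ffdbf}(\tau_i, t, v, s)} \le 1 + \frac{2 \times C_i}{2 \times C_i} = 2 \quad (23)$$
Rewriting yields: $\forall t > \Delta(\tau)$:
$$\mathrm{ffdbf}^*(\tau_i, t, v, s, \tau) \le \mathrm{ffdbf}(\tau_i, t, v, s) \times 2$$
This states the lemma. ■

Lemma 14. $\forall Q > 0$:
$$\left(\forall t > 0: \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, t, \theta^1, s)\right) \le m_1 \times t \times Q\right) \Rightarrow \left(\forall t \in TS(\tau, \theta^1): \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^{*1}(\tau_i, t, \theta^1, s, \tau)\right) \le m_1 \times t \times Q \times 2\right)$$

Proof: Suppose that the lemma was false. Then there exists a $Q$ such that $Q > 0$ and
$$\left(\forall t > 0: \sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, t, \theta^1, s) \le m_1 \times t \times Q\right) \land \left(\exists t \in TS(\tau, \theta^1): \sum_{\tau_i \in \tau} \mathrm{ffdbf}^{*1}(\tau_i, t, \theta^1, s, \tau) > m_1 \times t \times Q \times 2\right)$$
Since there exists a $t$ in $TS(\tau, \theta^1)$ such that the last constraint is true, let us choose one of them and call it $t_0$. This gives us:
$$\left(\forall t > 0: \sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, t, \theta^1, s) \le m_1 \times t \times Q\right) \land \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^{*1}(\tau_i, t_0, \theta^1, s, \tau) > m_1 \times t_0 \times Q \times 2\right) \quad (24)$$
Let us consider two cases:

Case 1: $t_0 \le \Delta(\tau)$. Applying $t = 2 \times t_0$ to the first constraint in Eq. (24) yields:
$$\left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, 2 \times t_0, \theta^1, s) \le m_1 \times 2 \times t_0 \times Q\right) \land \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^{*1}(\tau_i, t_0, \theta^1, s, \tau) > m_1 \times t_0 \times Q \times 2\right)$$
Since $t_0 \in TS(\tau, \theta^1)$, it follows from the definition of $\mathrm{ffdbf}^{*1}$ that $\mathrm{ffdbf}^{*1}(\tau_i, t_0, \theta^1, s, \tau) = \mathrm{ffdbf}^1(\tau_i, 2 \times t_0, \theta^1, s)$. Applying this on the last constraint yields:
$$\left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, 2 \times t_0, \theta^1, s) \le m_1 \times 2 \times t_0 \times Q\right) \land \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, 2 \times t_0, \theta^1, s) > m_1 \times t_0 \times Q \times 2\right)$$
This is a contradiction. End of Case 1.

Case 2: $t_0 > \Delta(\tau)$. Applying Lemma 24 on the last constraint in Eq. (24) yields:
$$\left(\forall t > 0: \sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, t, \theta^1, s) \le m_1 \times t \times Q\right) \land \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, t_0, \theta^1, s) \times 2 > m_1 \times t_0 \times Q \times 2\right)$$
Dividing the last constraint by 2 yields:
$$\left(\forall t > 0: \sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, t, \theta^1, s) \le m_1 \times t \times Q\right) \land \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^1(\tau_i, t_0, \theta^1, s) > m_1 \times t_0 \times Q\right)$$
This is a contradiction (take $t = t_0$ in the first constraint). End of Case 2.

It can be seen that if the lemma is false then for each case we obtain a contradiction. Hence, the lemma is true. ■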


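Lemma 15. $\forall Q > 0$:
$$\left(\forall t > 0: \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^2(\tau_i, t, \theta^2, s)\right) \le m_2 \times t \times Q\right) \Rightarrow \left(\forall t \in TS(\tau, \theta^2): \left(\sum_{\tau_i \in \tau} \mathrm{ffdbf}^{*2}(\tau_i, t, \theta^2, s, \tau)\right) \le m_2 \times t \times Q \times 2\right)$$

Proof: Analogous to the proof of Lemma 14. ■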
Algorithm 3: Another new intra-migrative task assignment algorithm for two-type heterogeneous multiprocessors.

Input: A taskset $\tau$ and a two-type platform $\Pi$
Output: Assignment of tasks to processor types indicated by matrix $X$

1  K := A
2  foreach $k_1 \in \{0..K-1\}$ do
3      foreach $k_2 \in \{0..K-1\}$ do
4          $\theta^1 := \frac{1}{R(\Pi)} + \frac{1 - 2/R(\Pi)}{K - 1} \times k_1$
5          $\theta^2 := \frac{1}{R(\Pi)} + \frac{1 - 2/R(\Pi)}{K - 1} \times k_2$
6          $(f, X)$ := solvePTMILP($\tau$, $\Pi$, $\theta^1$, $\theta^2$)
7          if ($f$ = true) then
8              declare SUCCESS and stop
9          end
10     end
11 end
12 declare FAILURE and stop


C.  Better  average-case  performance 


So far, we have discussed how to create an algorithm with a proven speedup factor and pseudo-polynomial time complexity, and Algorithm 2 offers this. But Algorithm 2 uses $\theta^1 = \theta^2 = 1/R(\Pi)$ and can perform poorly for some tasksets; one such taskset is the following: $m_1 = m_2 = \infty$, $\tau = \{\tau_1\}$, $T_1 = 1$, $ns_1 = 1$, $nseg_{1,1} = 1$, $C_{1,1} = 0.21$ and $s = 1$. We can improve the performance by also exploring other values of $\theta^1$ and $\theta^2$. Note that doing so gives us an algorithm that succeeds whenever Algorithm 2 succeeds but may also succeed in cases where Algorithm 2 fails. Also, note that as long as we choose a finite number of combinations of $\theta^1$ and $\theta^2$, and as long as each choice satisfies $\theta^1 < 1$ and $\theta^2 < 1$, the resulting algorithm has pseudo-polynomial time complexity (as can be seen from Lemma 11). Hence, we introduce Algorithm 3, based on this idea.
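The parameter sweep of Algorithm 3 can be sketched as follows. The value of K is left as a parameter, and `solve` is a hypothetical stand-in for solvePTMILP that returns only the boolean $f$. The grid always contains $1/R(\Pi)$ (for $k = 0$), which is why Algorithm 3 succeeds whenever Algorithm 2 succeeds, and its largest value is $1 - 1/R(\Pi) < 1$, which keeps every call within the pseudo-polynomial bound of Lemma 11.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, List, Tuple


def theta_grid(R: Fraction, K: int) -> List[Fraction]:
    """Lines 4-5 of Algorithm 3: K evenly spaced values from 1/R to 1 - 1/R."""
    if K < 2:
        raise ValueError("K must be at least 2")
    step = (1 - 2 / R) / (K - 1)
    return [1 / R + step * k for k in range(K)]


def algorithm3(solve: Callable[[Fraction, Fraction], bool],
               R: Fraction, K: int) -> Tuple[bool, Tuple]:
    """Try every (theta1, theta2) pair in the order of lines 2-3."""
    grid = theta_grid(R, K)
    for theta1, theta2 in product(grid, grid):   # k1 outer, k2 inner
        if solve(theta1, theta2):
            return True, (theta1, theta2)        # declare SUCCESS
    return False, ()                             # declare FAILURE
```

For example, with $R(\Pi) = 9/2$ and $K = 4$ the grid is $\{2/9, 11/27, 16/27, 7/9\}$.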


D.  Extensive  evaluation 


Recall that Fig. 14 presented an evaluation of Algorithm 2, but only for a small number of configurations, and it did not evaluate Algorithm 3. Therefore, in this section, we evaluate the running time with more configurations and also include Algorithm 3. The results are shown below. We also report the success ratio (SR) for each configuration of experiments. (We define the success ratio of a configuration as the number of tasksets for which the algorithm succeeded in finding a task assignment divided by the number of tasksets evaluated for this configuration.) It can be seen that Algorithm 3 offers a significant performance improvement (in terms of success ratio) at a minor increase in running time. The maximum time is 5.5 seconds.